The Clark Phase-able Sample Size Problem: Long-Range Phasing and Loss of Heterozygosity in GWAS
Authors
Abstract
A phase transition is taking place today. The amount of data generated by genome resequencing technologies is so large that in some cases it is now less expensive to repeat the experiment than to store the information generated by the experiment. In the next few years, it is quite possible that millions of Americans will have been genotyped. The question then arises of how to make the best use of this information and jointly estimate the haplotypes of all these individuals. The premise of this article is that long shared genomic regions (or tracts) are unlikely unless the haplotypes are identical by descent. These tracts can be used as input for a Clark-like phasing method to obtain a phasing solution of the sample. We show on simulated data that the algorithm will get an almost perfect solution if the number of individuals being genotyped is large enough and the correctness of the algorithm grows with the number of individuals being genotyped. We also study a related problem that connects copy number variation with phasing algorithm success. A loss of heterozygosity (LOH) event is when, by the laws of Mendelian inheritance, an individual should be heterozygous but, due to a deletion polymorphism, is not. Such polymorphisms are difficult to detect using existing algorithms, but play an important role in the genetics of disease and will confuse haplotype phasing algorithms if not accounted for. We will present an algorithm for detecting LOH regions across the genomes of thousands of individuals. The design of the long-range phasing algorithm and the loss of heterozygosity inference algorithms was inspired by our analysis of the Multiple Sclerosis (MS) GWAS dataset of the International Multiple Sclerosis Genetics Consortium. We present similar results to those obtained from the MS data.
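The "Clark-like phasing method" mentioned in the abstract builds on Clark's classic inference rule: if a known haplotype is compatible with an unphased genotype, the second haplotype is whatever remains after subtracting it. The following Python sketch illustrates only that textbook rule. It is not the long-range, tract-based algorithm of the article, and the function names, the 0/1/2 genotype coding (count of alternate alleles per site), and the toy data are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Haplotype = List[int]   # one allele (0 or 1) per site
Genotype = List[int]    # alternate-allele count (0, 1 or 2) per site


def clark_complement(genotype: Genotype, haplotype: Haplotype) -> Optional[Haplotype]:
    """Return the haplotype that pairs with `haplotype` to give `genotype`,
    or None if `haplotype` is incompatible with `genotype`."""
    if len(genotype) != len(haplotype):
        raise ValueError("genotype and haplotype must cover the same sites")
    complement = []
    for g, h in zip(genotype, haplotype):
        c = g - h
        if c not in (0, 1):      # the remaining allele must itself be 0 or 1
            return None
        complement.append(c)
    return complement


def clark_phase(
    genotypes: List[Genotype], seed_haplotypes: List[Haplotype]
) -> Dict[int, Tuple[Haplotype, Haplotype]]:
    """Greedily resolve genotypes against a growing pool of known haplotypes.

    Hypothetical helper: seeds would normally come from unambiguous
    individuals (homozygous at every site, or heterozygous at one site).
    """
    known = [list(h) for h in seed_haplotypes]
    resolved: Dict[int, Tuple[Haplotype, Haplotype]] = {}
    progress = True
    while progress:              # repeat until a full pass resolves nobody
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                c = clark_complement(g, h)
                if c is not None:
                    resolved[i] = (list(h), c)
                    if c not in known:
                        known.append(c)   # newly inferred haplotype joins the pool
                    progress = True
                    break
    return resolved


# Toy data (invented): individual 0 is homozygous, so its haplotype seeds the pool.
genotypes = [[0, 2, 0], [1, 2, 1], [1, 1, 2]]
result = clark_phase(genotypes, seed_haplotypes=[[0, 1, 0]])
for idx in sorted(result):
    print(idx, result[idx])
```

In this toy run, individual 2 can only be resolved after individual 1 contributes a new haplotype to the pool, which is why more genotyped individuals give the rule more chances to succeed. Unresolved individuals are simply absent from the returned dictionary.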
Similar resources
The Effect of Injection Timing and Phasing on the Emission of a Gasoline Single Cylinder Engine
Performance evaluation of Internal Combustion Engines (ICEs) and setting different emission standards has manifested the importance of pollution reduction as well as the optimal fuel consumption of these engines. Accordingly, the Engine Management Systems (EMS) are utilized which resulted in optimizing the power alongside the decrease in pollutant emission, through preparing the appropriate air...
Accelerating Haplotype-Based Genome-Wide Association Study Using Perfect Phylogeny and Phase-Known Reference Data
The genome-wide association study (GWAS) has become a routine approach for mapping disease risk loci with the advent of large-scale genotyping technologies. Multi-allelic haplotype markers can provide superior power compared with single-SNP markers in mapping disease loci. However, the application of haplotype-based analysis to GWAS is usually bottlenecked by prohibitive time cost for haplotype...
WhatsHap: fast and accurate read-based phasing
Read-based phasing allows the haplotype structure of a sample to be reconstructed purely from sequencing reads. ...
Analysis and Design of an N-bit Distributed MEMS Phase Shifter in the Ka Band
Modern microwave and millimeter wave phased array antennas are attractive because of their ability to steer wave beams in space without physically moving the antenna element. A typical phased array antenna may have several thousand elements fed by a phase shifter for every antenna, which can steer the resulting array beam to different directions. Their low loss, low cost and lightweight phase s...
New Optimum Cycle Length Relations for Isolated Pre-timed Intersections by Modifying Webster's Equation Based on the HCM 2000 Method
When the degree of saturation at an intersection approaches one, Webster’s optimum cycle length equation becomes inapplicable: the cycle length becomes very large as the degree of saturation approaches one and is entirely unrealistic once the degree of saturation exceeds one. This is not a problem for the HCM 2000 method. But optimum cycle length calculation in this method h...
Journal title:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 18, Issue 3
Pages -
Publication date 2010